Abstract

Domain adaptation techniques aim to adapt a classifier learnt on a source domain
so that it works on a target domain. One approach that has been investigated for
this problem exploits the subspaces spanned by the features of the source and
target domains. These techniques normally assume the existence of a single
subspace for the entire source or target domain. In this work, we consider the
hierarchical organization of the data and use multiple subspaces for the source
and target domains based on the hierarchy. We evaluate different subspace based
domain adaptation techniques under this setting and observe that using different
subspaces based on the hierarchy yields consistent improvement over a
non-hierarchical baseline.

1 Introduction 

When unseen test instances are evaluated with a classifier trained on a set of labelled training
instances, it is standard to assume that the test and training instances follow the same distribution.
However, many real world scenarios violate this assumption. Consider someone who wants to classify
images taken with a low quality phone camera for which no labels are available. Can these images be
classified using a classifier trained on a publicly available dataset such as ImageNet or Flickr?
In general, the answer is no. Many studies have shown that if the test instances are not sampled
from the same distribution as the training instances, the performance of the classifier diminishes
significantly [?]. This problem of domain shift is also extensively studied in the fields of natural
language processing and speech processing.
To address this challenge, methods have been suggested to adapt one domain (the source domain)
with respect to another (the target domain) so that a classifier trained on source domain data
also captures the properties of the target domain data. One can distinguish two settings in the domain
adaptation literature: (1) the unsupervised setting, in which the target domain is completely unlabeled,
and (2) the semi-supervised setting, in which the target domain is partially labeled. In both settings, the
source domain is fully labelled. In this work we focus on the unsupervised setting, which is the more
challenging one. A promising line of work for this problem is subspace based domain adaptation
[?]. However, none of these approaches takes the semantic (dis)similarity of the categories
into account. Classes which are semantically similar have a very different distribution than
classes which are semantically different. Based on this observation, we argue that it is better to
align the subspaces of semantically similar groups of classes separately rather than to consider the
whole target data distribution at once.

To address this challenge we propose a new method of step-wise subspace alignment for domain
adaptation. Step-wise subspace alignment here means that we first align the subspaces for a set
of larger clusters or groups of semantically similar categories and then for the categories within
the clusters.

We evaluate the effectiveness of the proposed approach on a standard dataset whose classes are
arranged in a hierarchy according to their semantics. However, the proposed approach can also be
applied when no hierarchy is available. In such a scenario, similar categories can be clustered
together in an unsupervised way.

2 Related Work 

As mentioned in Section 1, domain adaptation is widely studied in many fields including natural
language processing, speech processing and computer vision [?]. A survey of recent advances
in domain adaptation in natural language processing and computer vision can be found in
[?]. Subspace based approaches are the most popular for solving the visual domain shift problem
[?]. The same principle lies behind these approaches: they first determine separate subspaces
for the source and target data and then project the data onto these subspaces and/or a set of
intermediate sampled subspaces, with the aim of making the features domain invariant. In [?], a
method is proposed to sample subspaces along the geodesic between the source and target subspaces
on the Grassmann manifold. Once sampling is done, the features are projected onto the sampled
subspaces and a classifier is trained on the projected features. In [?], the geodesic flow kernel is
proposed to capture the incremental changes between the source and target subspaces along the
geodesic. Instead of using intermediate subspaces, [?] proposes to learn a transformation that
directly aligns the source subspace to the target subspace.

Only a few works have looked at the use of hierarchies in the context of domain adaptation. In [?],
Nguyen et al. propose to adapt a hierarchy of features to exploit the richness of visual data. The
intent behind this work is similar to ours, in that semantic closeness and context information
are exploited to boost domain adaptation performance. Taking this idea forward, a recent work
on hierarchical adaptive structural SVM for domain adaptation has been proposed in [?]. It
organizes multiple target domains into a hierarchical structure (tree) and adapts the source model to
them jointly. Others have used statistical methods for hierarchical domain adaptation, e.g. in [?] a
hierarchical Bayesian prior is used to solve the domain shift problem in natural language and speech
processing. However, these previous works assume a single common subspace between source
and target, while our approach makes use of the hierarchical structure among the different classes to
learn separate subspaces.

3 Background 

The proposed approach builds upon previously proposed subspace based methods [?]. One
could learn the domain shift between source and target data on the original features themselves.
However, this would be sub-optimal and would involve significantly modifying the classifiers.
Therefore, it is more common to learn it on a more robust representation of the data, obtained by
first selecting the d dominant eigenvectors computed with principal component analysis. These d
eigenvectors serve as the basis vectors of the source and target subspaces. The source and target
features are then projected onto the respective subspaces. The two recently proposed state-of-the-art
subspace based domain adaptation methods [?] used in this paper are discussed in Sections 3.1 and 3.2.
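
As a minimal sketch of this step (assuming NumPy; the helper name `pca_basis` and the synthetic
data are illustrative and not taken from the original work), the d dominant eigenvectors can be
obtained from an SVD of the centered data, and the features are then projected onto them:

```python
import numpy as np

def pca_basis(X, d):
    """Return a (D, d) matrix whose columns are the top-d principal directions of X (n, D)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    # The right singular vectors of the centered data are the eigenvectors
    # of the sample covariance matrix, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

# Synthetic stand-ins for source and target features (assumption: 20-D features).
rng = np.random.default_rng(0)
source = rng.normal(size=(200, 20))
target = rng.normal(size=(150, 20)) + 0.5

d = 5
X_S = pca_basis(source, d)   # source subspace basis
X_T = pca_basis(target, d)   # target subspace basis

source_proj = source @ X_S   # source features in the source subspace
target_proj = target @ X_T   # target features in the target subspace
```

The columns of each basis are orthonormal, which is what the closed-form solutions in the
following subsections rely on.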

3.1 Subspace Alignment 

The subspace alignment based domain adaptation method consists of learning a transformation matrix
$M$ that maps the source subspace to the target one [?]. The mathematical formulation of this problem
is given by

\[ F(M) = \|X_S M - X_T\|_F^2, \qquad M^* = \operatorname*{argmin}_{M} F(M). \tag{1} \]

Here $X_S$ and $X_T$ are matrices containing the d most important eigenvectors for the source and
target respectively, $M$ is a transformation matrix from the source subspace $X_S$ to the target
subspace $X_T$, and $\|\cdot\|_F$ is the Frobenius norm. The solution of eq. (1) is
$M^* = X_S^\top X_T$, and hence for the target aligned source coordinate system we get
$X_a = X_S X_S^\top X_T$.
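
The closed-form solution of eq. (1) can be sketched as follows. This is an illustrative NumPy
example, not the authors' code; the function names and the random orthonormal bases are
assumptions made for the demonstration.

```python
import numpy as np

def subspace_alignment(X_S, X_T):
    """Closed-form subspace alignment for orthonormal bases X_S, X_T of shape (D, d)."""
    M = X_S.T @ X_T      # minimizer of ||X_S M - X_T||_F^2
    X_a = X_S @ M        # target aligned source coordinate system
    return M, X_a

def alignment_cost(X_S, X_T, M):
    """Objective F(M) from eq. (1)."""
    return np.linalg.norm(X_S @ M - X_T, ord="fro") ** 2

# Random orthonormal bases stand in for the PCA subspaces (assumption).
rng = np.random.default_rng(0)
X_S, _ = np.linalg.qr(rng.normal(size=(20, 5)))
X_T, _ = np.linalg.qr(rng.normal(size=(20, 5)))

M_star, X_a = subspace_alignment(X_S, X_T)

# Any perturbation of M_star should increase the objective.
perturbed = M_star + 0.1 * rng.normal(size=M_star.shape)
print(alignment_cost(X_S, X_T, M_star), alignment_cost(X_S, X_T, perturbed))
```

Because $X_S$ has orthonormal columns, setting the gradient $2 X_S^\top (X_S M - X_T)$ to zero
gives $M = X_S^\top X_T$ directly, which is why no iterative optimization is needed.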

3.2 Geodesic Flow Kernel 

The geodesic flow kernel based domain adaptation method constructs an infinite-dimensional feature
space that carries the information of the incremental change from source to target domain data [?].
A key step in this method is to determine the geodesic curve between the two subspaces and to
construct the geodesic flow kernel. If $X_S$ and $X_T$ are source and target subspaces having the
same dimension, then these two subspaces are separate points on a Grassmann manifold, which is also
a Riemannian manifold. Let $R_S$ be the orthogonal complement to $X_S$. From the properties of the
Riemannian manifold, the flow from $X_S$ towards $X_T$ can be calculated as:

\[ \Phi(t) = X_S U_1 \Gamma(t) - R_S U_2 \Sigma(t), \quad \text{where} \quad X_S^\top X_T = U_1 \Gamma V^\top, \quad R_S^\top X_T = -U_2 \Sigma V^\top. \]


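Based on the decomposition of the source subspace $X_S$ and its orthogonal complement $R_S$, we can
obtain a geodesic flow kernel matrix $G$ that is given by

\[ G = \begin{bmatrix} X_S U_1 & R_S U_2 \end{bmatrix}
       \begin{bmatrix} \Lambda_1 & \Lambda_2 \\ \Lambda_2 & \Lambda_3 \end{bmatrix}
       \begin{bmatrix} U_1^\top X_S^\top \\ U_2^\top R_S^\top \end{bmatrix}, \]

where the $\Lambda$s are diagonal matrices that depend on the principal angles.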


Once we obtain the geodesic flow kernel matrix $G$, we can relate a labeled sample $x_i$ from the
source domain and an unlabeled sample $x_j$ from the target domain by using the kernel-induced
similarity $x_i^\top G x_j$.
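
The principal angles on which the kernel depends can be read off the decomposition of
$X_S^\top X_T$: its singular values are the cosines of the principal angles. The following NumPy
sketch illustrates only this step, not the full kernel; the function name `principal_angles` and
the toy subspaces are assumptions.

```python
import numpy as np

def principal_angles(X_S, X_T):
    """Principal angles between the subspaces spanned by orthonormal bases X_S and X_T."""
    # X_S^T X_T = U_1 Gamma V^T, where Gamma holds cos(theta_i) on its diagonal.
    U1, cosines, Vt = np.linalg.svd(X_S.T @ X_T)
    # Clip to guard against values marginally outside [-1, 1] due to rounding.
    theta = np.arccos(np.clip(cosines, -1.0, 1.0))
    return theta, U1, Vt.T

# Toy example in 4-D: coordinate subspaces with known angles.
I = np.eye(4)
same = principal_angles(I[:, :2], I[:, :2])[0]   # identical subspaces
orth = principal_angles(I[:, :2], I[:, 2:])[0]   # mutually orthogonal subspaces
print(same, orth)
```

Identical subspaces give angles of zero, while orthogonal subspaces give angles of pi/2; real
source and target subspaces fall in between.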

4 Our Approach 

In this section we describe how the methods explained in Section 3 are adapted for hierarchical
domain adaptation. Instead of using the same subspace throughout, we postulate that better results
can be obtained by using different subspaces for different levels of the hierarchy. Indeed, the more
specific subspaces spanned by instances of the categories in a certain branch of our tree
(corresponding to similar categories) can be expected to fit the data better and therefore to model
the domain shift better. For the source domain, these subspaces can easily be obtained. For the
target domain, however, no class labels are available as we are working in the unsupervised setting.
Therefore, the exact subspaces cannot be computed. We circumvent this problem by first predicting
the parent class label for each target instance, using the global subspaces and applying domain
adaptation at the level of the root node. We then use these predicted parent class labels to compute
the next level of subspaces.
This results in a two step approach, as summarized in Algorithm 1.

Algorithm 1 Subspace Based Hierarchical Domain Adaptation

procedure HierarchicalDomainAdaptation(Source Data S, Target Data T)
    X_{S_root} ← PCA(S) and X_{T_root} ← PCA(T)
    SubspaceAlign(X_{S_root}, X_{T_root}) or GFK(X_{S_root}, X_{T_root})
    ClassifyParent(Target Data T)
    i ← 0
    while i < No. of Parents do
        X_{S_i} ← PCA(S_i)        ▷ S_i are the labeled source data points of parent i
        X_{T_i} ← PCA(T_i)        ▷ T_i are the data points classified as parent i by the root classifier
        SubspaceAlign(X_{S_i}, X_{T_i}) or GFK(X_{S_i}, X_{T_i})
        ClassifyChild(Target Data T_i)
        i ← i + 1
    return Accuracy

In hierarchical subspace alignment we learn a different transformation matrix $M$ at each level of
the hierarchy independently. Without loss of generality, we consider here hierarchies with only two
levels, i.e. composed of a root node, a set of parent nodes (each corresponding to a set of similar
categories) and a set of child nodes or leaf nodes (corresponding to the different categories).
Hence the mathematical formulation of our approach is governed by eqs. (2) and (3).

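\[ F(M_{root}) = \|X_{S_{root}} M_{root} - X_{T_{root}}\|_F^2, \qquad M_{root}^* = \operatorname*{argmin}_{M_{root}} F(M_{root}) \tag{2} \]

\[ \forall i \in \text{parent}: \quad F(M_i) = \|X_{S_i} M_i - X_{T_i}\|_F^2, \qquad M_i^* = \operatorname*{argmin}_{M_i} F(M_i) \tag{3} \]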

Here $M_{root}^*$ is the transformation matrix learned at the topmost level of the hierarchy to
differentiate between the parents. Each parent category consists of several similar child
categories. $X_{S_{root}}$ and $X_{T_{root}}$ are the source and target subspaces computed from all
the source and target data. $M_i^*$ is the transformation matrix learned at the second level of the
hierarchy to distinguish between the children of parent i. $X_{S_i}$ and $X_{T_i}$ are the source
and target subspaces for the categories that belong to parent i. Hence $X_{S_i}$ and $X_{T_i}$ are
obtained using only the data points that belong to a child category of parent i.
The solutions of eqs. (2) and (3) are analogous to that of eq. (1).
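
The two step procedure can be sketched for the subspace alignment variant as follows. This is an
illustrative NumPy example under stated assumptions rather than the implementation used in the
experiments: it uses a 1-nearest-neighbour classifier, synthetic data, hypothetical helper names,
and a guard (`d_p`) that shrinks the subspace dimension for very small groups.

```python
import numpy as np

def pca_basis(X, d):
    """Top-d principal directions of X (n, D) as columns of a (D, d) matrix."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T

def align_and_project(S, T, d):
    """Subspace alignment: source and target features in the aligned d-dim space."""
    X_S, X_T = pca_basis(S, d), pca_basis(T, d)
    M = X_S.T @ X_T                      # closed-form solution of eq. (1)
    return S @ X_S @ M, T @ X_T

def nn_predict(train_X, train_y, test_X):
    """1-nearest-neighbour prediction with squared Euclidean distance."""
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    return train_y[d2.argmin(axis=1)]

def hierarchical_sa(S, parent_S, child_S, T, d):
    """Predict parent labels at the root, then child labels within each parent."""
    # Step 1: root-level alignment on all data, classify parents.
    S_root, T_root = align_and_project(S, T, d)
    parent_pred = nn_predict(S_root, parent_S, T_root)
    child_pred = np.empty(len(T), dtype=child_S.dtype)
    # Step 2: one alignment per predicted parent, classify children.
    for p in np.unique(parent_pred):
        s_mask = parent_S == p
        t_mask = parent_pred == p
        d_p = min(d, s_mask.sum(), t_mask.sum())   # assumption: guard for tiny groups
        S_p, T_p = align_and_project(S[s_mask], T[t_mask], d_p)
        child_pred[t_mask] = nn_predict(S_p, child_S[s_mask], T_p)
    return parent_pred, child_pred

# Synthetic two-level hierarchy: 2 parents with 2 children each (assumption).
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(4, 10))
child_to_parent = np.array([0, 0, 1, 1])
child_S = np.repeat(np.arange(4), 30)
parent_S = child_to_parent[child_S]
S = centers[child_S] + rng.normal(size=(120, 10))
child_T = np.repeat(np.arange(4), 20)
T = centers[child_T] + rng.normal(size=(80, 10)) + 0.5   # shifted target domain

parent_pred, child_pred = hierarchical_sa(S, parent_S, child_S, T, d=3)
print("child accuracy:", (child_pred == child_T).mean())
```

By construction, every predicted child label belongs to the predicted parent, so an error at the
root level cannot be corrected at the second level.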

In the hierarchical geodesic flow kernel we compute different kernel matrices at different levels of
the hierarchy for categorization at a specific level. For classifying between the parent classes,
the kernel matrix $G_{root}$ is computed considering the source and target subspaces generated by
all the data. For classifying between the children of a specific parent category i we compute the
kernel matrix $G_i$ considering the subspaces obtained from the children classes of parent i. The
similarity between two data points therefore depends on the hierarchy level at which prediction is
performed.

Figure 1: Image samples taken from Caltech-256 and Bing-Caltech to show domain shift. Panels:
(a) Penguin, Caltech-256; (b) Penguin, Bing-Caltech; (c) Bear, Caltech-256; (d) Bear, Bing-Caltech;
(e) Conch, Caltech-256; (f) Conch, Bing-Caltech.

5 Experiments 

In this section we evaluate our approach on a part of the hierarchy taken from the Caltech-256 and
Bing-Caltech datasets [?]. We show our experimental results on the animal hierarchy consisting of
the following three parent nodes: aquatic, terrestrial and avian animals, where each parent consists
of several child categories. For each image in the dataset we compute 4096-dimensional
convolutional neural network based features using DeCAF [?], after first resizing the full
image to the desired input size. Note that the dataset has not been augmented with any virtual
examples by flipping or random cropping. In this paper we use K-NN as our classifier, as was
also done in [?]. The rank of the domain is decided using the procedure given
in [?], and based on this procedure we fix the dimensionality of both the root and
parent subspaces to 53. We first report the result without applying any domain adaptation algorithm
to the source and target data, to show that there exists a non-negligible domain shift between these
two datasets. This is shown in Table 1 in the column “Base Accuracy”. The results in Table 1 show
that the hierarchy based variants consistently improve the results, by between 0.45 and 4.44 points
over their non-hierarchical counterparts. We also evaluate the similarity between the subspaces of
the source and target domains by taking the dot product of their basis matrices,
$\mathrm{trace}(A^\top B)$, at various levels of the hierarchy to analyse our approach. This result
is provided in Table 2. As can be seen from the table, the maximum similarity is observed between
the corresponding subspaces in source and target. The generally lower values off the main diagonal
indicate that the subspaces for the different parent nodes are quite different from one another and
different from the root subspace.
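
The similarity measure used for this analysis can be sketched as below. The function names and
the random orthonormal bases are assumptions for illustration; in the experiments the bases would
be the PCA subspaces of the root and parent nodes.

```python
import numpy as np

def subspace_similarity(A, B):
    """Dot product of two subspace bases of shape (D, d): trace(A^T B)."""
    return float(np.trace(A.T @ B))

def similarity_matrix(source_bases, target_bases):
    """Pairwise similarities; rows index source subspaces, columns target subspaces."""
    return np.array([[subspace_similarity(A, B) for B in target_bases]
                     for A in source_bases])

# Random orthonormal bases stand in for root/parent subspaces (assumption).
rng = np.random.default_rng(0)
bases = [np.linalg.qr(rng.normal(size=(20, 5)))[0] for _ in range(3)]

sim = similarity_matrix(bases, bases)
print(np.round(sim, 2))
```

For orthonormal bases the value is bounded in magnitude by the subspace dimension d and equals d
when a basis is compared with itself. Note that this quantity depends on the sign and ordering of
the basis vectors, so negative entries are possible.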


Source Dataset   Target Dataset   Method   Base Accuracy   Accuracy (without Hierarchy)   Accuracy (with Hierarchy)
Caltech-256      Bing-Caltech     GFK      24.41           39.96                          40.41
Bing-Caltech     Caltech-256      GFK      21.11           45.23                          49.67
Caltech-256      Bing-Caltech     SA       24.41           39.24                          40.78
Bing-Caltech     Caltech-256      SA       21.11           44.12                          48.36

Table 1: Results for hierarchical domain adaptation on the animal hierarchy of Caltech-256 and
Bing-Caltech. Here GFK denotes the geodesic flow kernel and SA denotes subspace alignment.



                      Root(Target)   Avian(Target)   Terrestrial(Target)   Aquatic(Target)
Root(Source)          3.96           1.35            0.26                  2.05
Avian(Source)         3.11           3.74            -0.10                 0.18
Terrestrial(Source)   1.44           2.62            3.92                  0.76
Aquatic(Source)       1.22           0.43            1.42                  3.16

Table 2: Similarity matrix for the source and target subspaces, considering each hierarchy
level separately. Here Caltech-256 is the source and Bing-Caltech is the target.

6 Conclusion 

In this paper, we have considered a hierarchical subspace based domain adaptation approach. Based
on the evaluation, we observe that using different domain adaptation subspaces specific to
groups of similar categories can indeed aid domain adaptation. In future work, we would like to
evaluate the effect of restricting the subspaces to groups of classes that need not be obtained
strictly from a hierarchy. This would generalize the approach to source and target domains that are
not hierarchically labeled.